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Asymptotically Optimal Planning by Feasible 
Kinodynamic Planning in State-Cost Space 
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Abstract —This paper presents an equivalence between feasible 
kinodynamic planning and optimal kinodynamic planning, in 
that any optimal planning problem can be transformed into a 
series of feasible planning problems in a state-cost space whose 
solutions approach the optimum. This transformation gives rise 
to a meta-algorithm that produces an asymptotically optimal 
planner, given any feasible kinodynamic planner as a subroutine. 
The meta-algorithm is proven to be asymptotically optimal, and 
a formula is derived relating expected running time and solution 
suboptimality. It is directly applicable to a wide range of optimal 
planning problems because it does not resort to the use of steering 
functions or numerical boundary-value problem solvers. On a 
set of benchmark problems, it is demonstrated to perform, using 
the EST and RRT algorithms as subroutines, at a superior or 
comparable level to related planners. 

1. Introduction 

Optimal motion planning is a highly active research topic 
in robotics, due to the pervasive need to compute paths 
that simultaneously avoid complex obstacles, satisfy dynamic 
constraints, and are high quality according to some cost 
function. Recent advances in sampling-based optimal motion 
planning build on decades of work in the topic of feasible 
motion planning, in which costs are ignored. However, the 
field is still some ways away from general-purpose optimal 
planning algorithms that accept arbitrary black-box constraints 
and costs as input. In particular, optimality under kinematic 
and differential constraints remains a major challenge for 
sampling-based planners. 

This paper presents a new state-cost space formulation 
that transforms optimal motion planning problems into fea¬ 
sible kinodynamic (both kinematically- and differentially- 
constrained) motion planning problems. Using this formu¬ 
lation, we introduce a meta-algorithm, AO-x, to adapt any 
feasible kinodynamic planner x into an asymptotically-optimal 
motion planner, provided that x satisfies some relatively un- 
restrictive conditions, e.g., expected running time is finite. 
The meta-algorithm accepts arbitrary cost functions, including 
non-differentiable ones, and handles whatever kinematic and 
differential constraints are handled by the underlying feasible 
planner. 

The formulation is rather straightforward: n-dimensional 
state is augmented with an auxiliary cost variable, which 
measures the cost-to-come (i.e., accumulated cost from the 
start state), yielding a (n + 1)-dimensional, dynamically- 
constrained feasible problem in (state, cost) space (Fig. [T]). 
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Fig. 1. The state-cost space for a 2D, kinematically constrained problem with 
a path length cost function. State-cost space is 3D, with a conical reachable 
set with an apex at the start configuration (green circle). Two paths to the 
same state-space target (red filled circles) follow trajectories that arrive at 
different points in state-cost space (red open circles). 

The meta-algorithm proceeds by generating a series of feasible 
trajectories in state-cost space with progressively lower costs. 
This is accomplished by first generating a feasible trajectory 
in state space, then progressively shrinking an upper bound on 
cost according to the cost of the best path found so far. This 
meta-algorithm is proven to converge toward an optimal path 
under relatively unrestrictive conditions. 

The AO -X meta-algorithm is demonstrated on practical 
examples using the RRT flSl and EST |(9l algorithms as 
subroutines for feasible kinodynamic planning. Due to prior 
theoretical work on the running time of EST, we are able to 
prove that the expected running time of the meta-algorithm 
is 0(e“^ lnlne“^), where e is the solution suboptimality. 
Critically, this is one of the few asymptotically-optimal plan¬ 
ners that exclusively uses control-sampling to handle dynamic 
constraints, rather than resorting to a steering function or a 
numerical two-point boundary value problem solver. The new 
method outperforms prior planners in several toy scenarios in¬ 
cluding both dynamic constraints and complex cost functions. 

II. Background and Related Work 

Optimal motion planning has been a topic of renewed 
activity in robotics largely due to the advent of sampling- 
based motion planners that are proven to be asymptotically 
optimal csi. But the community has had a long history of 
interest in optimal motions. Numerical trajectory optimization 
techniques O |3l [22]| have been long studied, but have 

several drawbacks. First, they are prone to falling into lo¬ 
cal minima, and second they typically require differentiable 
constraint and cost representations, which are often hard to 
produce for complex obstacles. Grid-based planners EiniEa 
are often fast in low dimensional spaces but suffer from 
the “curse of dimensionality,” with performance degrading 
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rapidly in spaces of higher dimension 1^ . Sampling-based 
planners were originally developed to overcome many of these 
challenges, and have been shown to have excellent empirical 
performance in finding feasible paths in high-dimensional 
spaces, both without oa and with dynamic constraints Eiiia. 
However, they tend to produce jerky paths that are far from 
optimal. Some hybrid approaches have combined sampling- 
based planning with local optimization to produce better 
paths 11911251. 

More recently, sampling-based optimal planners like RRT* 
produce asymptotically-optimal paths whose costs converge 
in expectation toward the optimal path. The insight is that 
optimal paths can be obtained by judiciously “rewiring” a tree 
of states to add connections that reduce the cost to a given 
node in the tree. However, this requires a steering function, a 
method to produce a curve between two states that is optimal 
when obstacles are ignored. In systems without dynamic 
constraints, this is as simple as generating a straight line. 
But steering functions for dynamically-constrained systems are 
much harder to come by. 

Several authors have extended RRT* to dynamically- 
constrained systems. It is relatively easy to apply RRT* to 
dynamically-constrained systems if a steering function is avail¬ 
able ca. Proving convergence is harder, requiring analysis of 
small-time controllability conditions ifTTI . Other authors have 
extended RRT* to systems whose dynamics and costs are 
(or can be approximated) by linear and quadratic functions, 
respectively, by definition of a suitable steering function 
based on the LQR principle |[2T] |26l. When more complex 
differential constraints are involved, it may not be possible 
to devise a suitable steering functions. One method generated 
complex maneuvers using RRT* and performed each rewiring 
step by numerically solving a two-point boundary value 
problems cni. This adds greatly to computational expense. 
A similar method performed rewiring using a spline-based 
trajectory representation that is optimized via a nonlinear 
program solver 1^ . 

The prior work with closest relation to ours in terms of 
generality of applicability is the Sparse-Stable-RRT plan¬ 
ner CD. Like our work, it avoids the use of a steering function 
entirely and samples directly in control space. Approximate 
rewiring is performed by allowing connections to points 
that are “near enough” according to a state-space distance 
metric. This scheme was proven to satisfy asymptotic near¬ 
optimality, which is the property of converging toward a path 
with bounded suboptimality 0. In a more recent paper, the 
same authors have extended it to an asymptotically-optimal 
planner, SST*, by progressively shrinking nearness threshold 
parameters (m. However, Sparse-Stable-RRT and SST* and 
have many parameters to tune, and our experiments suggest 
that AO-x planners in general outperform both planners. 

We note the similarity of algorithm to Anytime-RRT (71, 
except that the planner uses state-cost space rather than 
simply state space (and has fewer parameters to tune). Hence, 
Anytime-RRT does not obey any theoretical asymptotic- or 
near-optimality guarantees. This is critical in practice, as our 
experiments suggest Anytime-RRT tends not to converge to 
an optimum. 


III. Theoretical Formulation 

This section presents the state-cost space formula¬ 
tion, the meta-algorithms, and theoretical results regarding 
asymptotically-optimality. 


A. Terminology 

First we define key concepts of feasible, optimal, and 
boundedly-suboptimal planning problems, as well as complete, 
probabilistically complete, and asymptotically optimal plan¬ 
ners. Let X denote the state space. 

Definition 1. A feasible (kinodynamic) planning problem 
P = {X^U^xi^G, F, D) asks to produce a trajectory 
y{s) : [0, S] ^ X and control u{s) : [0, S] U such that: 


2 /( 0 ) = xi 

2/(1) gGCX 

2 /(s) eFCX Vs G [0, S'] 

u{s)eB{y{s)) VsG[0,S] 


(initial state) (1) 

(goal state) (2) 

(kinematic constraints) 

(3) 

(control constraints) (4) 
y'{s) = D(y(s)^u{s))\/s G [O^S] (dynamic equation) (5) 


This is a highly general formulation. Note that kinematic 
planning problems can simply set the control variable u{s) 
to the derivative of the path, the control set io B = 
{n I ||t^||< 1}, and the dynamic equation as D(y^u) = u. 
Second-order planning problems (i.e., those with inertia) can 
be defined with a configuration x velocity state x = (q^q). 
Problems with time-variant constraints can be constructed in 
this same form by augmenting the state variable with the time 
variable (x,t). 

Definition 2. An optimal planning problem 
P = (L, A, f/, X/, G, F, F, D) asks to produce a trajectory 
y{s) : [0, F] ^ X that minimizes the objective functional: 

C{y) = r Liyis), uis))ds + i>( 2 /(S)) (6) 

Jo 

among all feasible trajectories (those that satisfy (1-5)). Here 
L is the incremental cost and 4> is the terminal cost. 

Definition 3. A bounded-suboptimality planning problem 
(P, e) asks to find a trajectory satisfying C{y) = G* + e, 
where G* is the cost of the optimal path and e > ^ is a 
specified parameter. 

Definition 4. A complete planner A finds a feasible solu¬ 
tion to a problem P when one exists, and terminates with 
‘failure’’ if one does not. Moreover, it does so in finite 
time. A probabilistically-complete planner A finds a feasible 
solution to a problem P, when one exists, with probability 
approaching 1 as more time is spent planning. A planner A is 
asymptotically-optimal for the optimal planning problem P if 
the cost C{t) of the generated path approaches the optimum 
G* with probability 1 as more time t is spent planning. 
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B. State-cost space equivalence 

Our first contribution is to demonstrate an equivalence of 
any optimal planning problem with that of a canonical state- 
cost form in which the dependence on the incremental cost L 
is eliminated. In particular, we augment each state x with the 
cost c taken to reach it from xj to derive an expanded state 
^ = (x, c). 


Theorem 1. The optimal planning problem 
P = (L, X, f/, X/, G, F, 5,1)) is equivalent to a state-cost 
optimal planning problem without incremental costs 

P = (0, X X i?+, U, {xi, 0), GxR+,Fx R+, B, D), 


in such a way that solutions to P are in one-to-one correspon¬ 
dence with solutions to P. Here the terminal cost is given 
by 

X 




= c + ^{x) 


(7) 


And the dynamics D are given by 


x' 


D (x, u) 

1- 


L (x, u) 


The proof is straightforward, showing that the projection of 
solutions to P onto the first dim{X) elements are solutions 
to P, and solutions to P can be mapped to solutions to P 
via augmenting them with a cumulative cost dimension. Note 
that even if every state in the original space was reachable, 
not all points in the state-cost space are reachable. Moreover, 
the goal set is now a cylinder with infinite extent in the cost 
direction (Fig. [^a). 

Using this state-cost transformation, we derive a corollary 
that states that a boundedly-suboptimal problem with cost at 
most c can be solved by solving a feasible planning problem 
(Fig. 0b). 


Corollary 1. A bounded-suboptimality planning problem P 
with e = c — G* is equivalent to a feasible planning problem 
Pc = {X X P+,U,(x7,0),Gc,P X P+,P,P) where Gc = 
{(x, c) I X G G, c G [0, c — is the set of terminal state- 

cost pairs satisfying the goal condition and the cost bound, 
and D is given by the state-cost transformation. 


Specifically, if y is a solution to Pc, then it corresponds to 
a feasible solution of P with cost no more than c (and no less 
than G*). Also, Pc has no solution if and only if c < G*. 


C. Bounded-suboptimality meta-planning with a complete fea¬ 
sible planner 

The above corollary suggests that bounded-suboptimality 
planning is equivalent to feasible kinodynamic planning; how¬ 
ever, G* is a priori unknown. Hence, we present a bounded- 
suboptimality meta-planner that repeatedly invokes a feasible 
planner while lowering an upper bound on cost. This idea 
builds some intuition for the asymptotically-optimal planner 
presented in the following section. 

The meta algorithm Bounded-Suboptimal(P,e,A) accepts as 
input a problem P, a tolerance e, and a complete feasible 
planning algorithm A, and is listed as follows: 


Algorithm 1 Bounded-Suboptimal(P,e,A) 

1: Run A (Poo) to obtain a first path yo. If no solution exists, 
then report 'P has no solution’. 

2: Let Co = C{yo). 

3: for i = 1, 2,... do 

4: Run A{Pc._^-e) to obtain a new solution yi. If no 

solution to Pc^_^-e exists, then stop. 

5: Let Ci = C{yi). 


Step 1 solves for a feasible solution to the original problem, 
with no limit on cost. In practice, it may be solved in the 
original state space simply by discarding the cost function. 
In the loop. Step 4 establishes a new cost upper bound by 
lowering the best cost found so far c^_i by e. The following 
theorem proves correctness of this meta-algorithm. 

Theorem 2 . If A is a complete planner for the feasible kin¬ 
odynamic planning problem, then Bounded-Suboptimal(A,e) 
terminates in finite time and produces a path yi with cost no 
more than G* + e. 


Proof: Let i + 1 be the index of the final iteration. In the 
prior iteration, a solution was found, so c^ > G*. In the current 
iteration, no solution is found, and since A is complete, the 
current planning problem, Pa-e is infeasible. So, c^ — e < G*, 
and therefore Ci = C{yi) is within e of optimal. 

Running time is finite since each loop reduces the cost by at 
least e, and hence the inner loop is run no more than 
times. 


D. Asymptotically-optimal meta-planning with a randomized 
feasible planner 

The need for a complete planner is too restrictive for 
practical use on high-dimensional problems, where only prob¬ 
abilistically complete planners are practical. Here, we relax 
this restriction while also eliminating the dependence on the 
parameter e, under the unrestrictive assumption that the cost 
is lowered by a nonnegligible fraction whenever A finds a 
feasible path. 

We will need to make some assumptions such that A is 
“well-behaved” so that it has a significant chance of finding 
a path that shrinks the best cost found so far regardless of 
c. Given some cost upper bound c, define the cost of the 
next produced solution follow a cumulative density function 
(p{C{y);c). This function ranges from 0 to 1 on the support 
[G*,c], i.e., P{C{y) < z) = (p{z;P). We do not prescribe 
any form for this distribution, however, we do require one 
condition for its moment. 

Well-behavedness of A requires two conditions: 

1) If there exists a feasible solution and c > G*, then A 
terminates in finite time. 

2) Given a cost bound c expected suboptimality of the 
computed path is shrunk toward by a non-negligible 
amount each iteration. (In practice, this means that there 
is a nonzero chance that the planner does not produce 
the worst-possible path). 
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Fig. 2. (a) The goal region in state-cost space extends the state-space goal region G infinitely along the cost axis. The optimal cost C* is the least-cost 

portion of the reachable set that touches the region, (b) Given the cost of an existing path c, the optimal path is known to lie in the region of space with cost 
lower than c. Finding a trajectory that improves upon c is a feasible planning problem, (c) The reachable portion of this goal set shrinks as c approaches the 
optimum. 


Specifically, condition 2 requires that: 

E[C{y)\c] < (l-^)(c-C*) (9) 

for some ic > 0 (the w is required for technical reasons; for 
most cases this condition enforces that E[C{y)\c] < (c—C*)). 
This condition is not overly restrictive for most randomized 
planners; the set of paths with C{y) = c is a set with measure 
zero in the space of paths, and is unlikely to be sampled at 
random. 

We are now ready to present the main algorithm, AO-x. 


Algorithm 2 Asymptotically-optimal(P,A,n) 

1: Run A (Poo) to obtain a first path y^. If no solution exists, 
report 'P has no solution’. 

2: Let Co = C{yo). 

3: for i = 1, 2,..., n do 

4: Run A{Pc._^) to obtain a new solution yi. 

5: Lei Ci = C{yi). 

return y^ 


Theorem 3 . If A is a well-behaved randomized algorithm, 
then Asymptotically-optimal(A^n) is asymptotically optimal. 
In other words, as n approaches infinity, the probability that 
yn is not an optimal path approaches zero. 

Proof: Let Xq ,..., Xn be the nonnegative random vari¬ 
ables denoting the suboptimality C (yi) — C* during a run of 
the algorithm. We will show that they converge almost surely 
to the optimum as n increases. That is we want to show that 

P (^liTTlfl^QQ Xjl 0 ) 1 . 

Almost sure (a.s.) convergence is equivalent 
to lim„_>oo P (sup„>„ Xm> e) = 0. Since 

sup^>^X^ = Xn, a.s. convergence is implied by 
convergence in probability lim^^oo P{Xn > e) = 0 . To prove 
convergence in probability, we will prove lim^^oo E[Xn] = 0 
and then use the Markov inequality P{Xn > e) < E[Xn]/e. 


Conditioning on Xn-i, we get: 

E[Xn] = J E[Xn I Xn-l]PiXn-l)dXn-l 

and due to ([^ we have E[Xn \ Xn-i] < {l — w)xn-i. Hence, 

~ (1 Xn—lP{Xn—l^dXn—l ( 10 ) 

= (1 - w)E[Xn-i] = (1 - W^ElXo] 

and thus, 

P{Xn>e)<E[Xo]{l-wr/e (11) 

Clearly this approaches 0 as n increases. ■ 

E. Convergence rate with respect to time 

We now take a more detailed analysis of the case in which 
the feasible planner is probabilistically complete, and study the 
convergence of Asymptotically-Optimal in terms of running 
time t rather than the number n of planner calls. We show 
again, under relatively weak assumptions, that Asymptotically- 
Optimal is asymptotically optimal in terms of time, even 
though each call to the planner takes increasingly longer to 
complete as n increases because the reachable portion of the 
goal set shrinks (Fig. [^c). 

A planner is probabilistically complete if the probability that 
it finds a feasible path, if one exists, approaches 1 as more 
time is spent planning. Note that a probabilistically complete 
planner will not necessarily terminate if no feasible path exists. 

Note that probabilistic completeness is not a sufficient 
condition for a planner to be useful, since the convergence 
rate may be so slow that it is impractical. As an example, 
let A be a probabilistically complete planner, and f{t) denote 
P(A fails given t seconds of planning). If f{t) = 1/t, then 
expected running time is infinite. 

We will assume that for the given X, xj, E, and D the 
planner A satisfies an exponential convergence bound, in 
which f(t) < max(l,ae“^^) for some positive values a 
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and In practice, an exponential convergence bound implies 
expected running time is finite. 

a 

E[t] < tf{t)dt < - 

A more refined analysis lH gives a tighter bound 

pOO n(\na)/P nOO 

/ tf{t)dt = / tdt + / tae~^^dt 

Jo Jo J(\na)/P 1121 


P 1 - e-/5 

Convergence rate varies, however, depending on the reach¬ 
able portion of the goal region Gc H R{xj) where R{x) 
is the reachability set of x in state-cost-space (Fig. [^c). In 
particular, a small goal region makes it rare to A to sample 
a configuration in it at random, which slows convergence. 
Hence, the convergence rates are properly defined as a function 
of the volume of the goal region: 



Fig. 3. Running time for each invocation of the underlying planning 
subroutine increases asymptotically to infinity as the reachable goal volume 
decreases. The visibility characteristics of the underlying space also have 
major effects on running time. Plots illustrate the theoretical bounds on 
expected running time of EST for spaces of three different “difficulty” levels. 
Specifically, visibility constants a and /3 are set to 0.04, 0.02, and 0.01 for 
easy, medium, and hard spaces. It is assumed that 1,000 samples are generated 
per second. 


ct = a{fi{Gc n R{xi))),^ = ^{fi{Gc n R{xi))) (13) 

The following theorem gives an example of such a bound when 
the EST algorithm is used as the underlying feasible planner. 

Theorem. Assuming the space is expansive, the Kinody- 
namic EST planner (Hsu, Latombe, Kindel, and Rock 2001) 
satisfies an exponential convergence bound with constants 
a{g) = 7 ln ^ and ^{g) = dg for positive constants 7 and 5, 
where g is the volume of the reachable goal region. Moreover, 
E[t] is O Qlnln^^ as g approaches 0 . 

Proof: Erom Hsu, Latombe, Kindel, and Rock 2002 191 , 
Kinodynamic EST with a uniform sampling strategy fails 
to find a path with probability no more than p if at least 
- In — + - In - milestones are sampled, where k = 4ln- 
and a and [3 are expansiveness constants that are fixed for the 
given configuration space (not related to the a and P defined 
above). Using a bit of algebra, this expression can be rewritten 
as a bounded probability of failure: 

kg 2cx —tag 

f{t) < {2k)^3 + 2a .2kg + 2a . g feg + 2 a 


Here we have assumed that each sample takes constant time 
and the constant factor is ignored. 

Now we will simplify this rather unwieldy expression. Eirst, 
note that the exponents of the first two terms in the equation 
are upper bounded by 1 since they are ratios of two positive 
numbers to their sums, and hence 

f{t) ^ 4/c • e^9 + 2cx 


Next, we use the fact that \nx < x. Hence, /c = ^ In | < ^. 
The factor in the exponent can now be lower bounded 

by hence we have the desired expression 


f{t) < 7 lnie-*'^s 


(14) 


With 7=7 and 5 = 2 -v 2 ai 3 constant for a given configuration 
space. As a result, the running time is bounded by E [t] < 
In 7 -j-In In 4 + i-l-sg • As g shrinks, the latter term’s 


order of convergence is so we can conclude that E[t] is 

O ( ^Inln 4 ) as desired. ■ 

3) 

Eig. illustrates this bound. It is apparent that, since G 
is fixed by the original problem, Gc varies only with the 
parameter c. Hence we may also state these functions as a{c) 
and /3(c). In order for EST to be “well-behaved” as defined 
above, we must require that as c approaches C*, the volume 
of Gc n R{xi) is nonzero as long as c > C*. 

Let us now state our main result regarding AO-A, which is 
the planner defined as Asymptotically-Optimal(P, A, oc). 

Theorem 4 . If A is a probabilistically complete, exponentially 
convergent planner, then AO-A is asymptotically optimal in 
total running time t 

Proof: Define c(t) as the cost of the best path found so 
far in AO-A after time t has elapsed. We will show that it 
converges almost surely toward C* as t increases. Let Z{t) 
be the random variable denoting the suboptimality c{t) — C* 
during a run of Asymptotically-optimal(A, oc). We wish to 
prove that limt^cxD > e) = 0 for any e > 0. 

Let T{x) be the random variable denoting the time at which 
the cost of the best path found so far decreases below x+C*. It 
is evident that P {Z{t) > e) = P (T(e) > t). We wish to show 
that E[T{e)] < 00 , which would in turn imply the claim due 
to the Markov inequality P (T(e) >t)<E [T{e)] jt . 

If the current cost bound is greater than e, then the expected 
time to find a path for any iteration is upper-bounded by the 
expected time it would take to find a path to Gc^+e, which 
is some finite value since A is exponentially convergent. 
Specifically, = -^ for kinodynamic EST as shown above. 
So, if Asymptotically-optimal first finds a path with subop¬ 
timality no more than e on the i’th iteration, then the cost 
expended is no more than 

If we let N denote the random variable of the iteration on 
which Asymptotically-optimal first finds a path with subopti- 
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mality no more than e, then we can bound E [T(e)] as follows: 

CX3 CX3 

E [T(e)] < ^ iUP {I = i)=uY, iP {N = i) = t,E[N] 

i=0 i=0 

To show that E[N] is finite, we will take a variant of the 
proof of Theorem 3. Again let Aq, ..., be the nonnegative 
random variables denoting the suboptimality C (i/i) — C* 
during a run of the algorithm. N is the index of the first A^ 
that decreases below e. Hence, 

P{N <i) = P{Xi < e) 


and 


P(A = i)= P{N < i) - P{N < i - 1) 


(15) 


= P{Xi < e) - P(Xi_i < e) 

We will use the cumulative probability function definition: 

P{Xi < z\Xi-i) = + z;C* + Xi_i) 

and begin by conditioning on Xj_i. 

POO 

P{Xi<e)= / P{Xi < e|Xi_i)dP(Xi_i) 

Jo 

= rp{Xi<€\ 

Jo 

/ CO 

P{Xi<€\Xi_,)dP{Xi_,) 

PC pCO 

= l-dP{Xi_i)+ ifiC* +e-,c* +Xi_i)dP{Xi_i) 

/ CO 

^{C*+e; C* + Xi_i)dP{Xi_i) 


(16) 


Hence, 


/ CX) 

<p{C* + e-, C*+Xi_i)dP(Xi_i) 

/ C» 

dP(A,_i)=P(A,_i >e) (17) 

Where we have applied the bound (p{x;y) < 1 which holds 
because ip is a. CDF. 

We can now apply equation ( pTj ) derived in Theorem 3 to 
obtain P(A = i) < E [Aq] {l — wy~^/e where (l — w) < 1 is 
defined as before. This upper bound has the form of a geomet¬ 
ric distribution with parameter w. So, E[N] < P[AQ]/e 
which is finite. ■ 

To be specific, if we were to use the exponential conver¬ 
gence bound for kinodynamic EST, we may conclude that 
E[T{e)] isO(^lnlni). 

We note that this convergence bound is rather loose; earlier 
iterations will likely terminate much faster than p. 


F. Complexity Discussion 

The computational complexity of AO-x is affected by 
several aspects of problem structure. As remarked before, the 
visibility characteristics of the problem affect the running time 
of the feasible planning subroutine x. As a result, the optimal 


parameters of x, such as the expansion distance in RRT, are 
problem dependent. 

We also note that planner performance in state space 
may be different from performance in (state, cost) space. 
Adding a dimension of cost may increase both time and 
space complexity, and it also adds drift to problems that may 
originally be driftless. We note, however, that the control space 
remains unchanged, and the performance of many planners are 
governed chiefly by control complexity. 

Lastly, we observe that problem dimensionality does not 
have a direct relationship to the order of convergence of AO- 
X. However, it does have a large impact in the running time 
of X, which is manifest in the terms and w in the proof 
above. Problems of higher dimension will tend to have a larger 
value of the term although it is easy to construct hard low 
dimensional problems. The expected cost reduction w is also 
dimensionality-dependent; for example, if the reachable goal 
region in (state, cost) space is locally shaped at the optimum 
like a convex cone of dimension d, then a goal configuration 
sampled at random will achieve an average cost reduction of 
0(l/(d+ 1)). 

IV. Implementations and Experiments 

This section describes the application of AO-x to several 
example problems using the feasible kinodynamic planners 
EST and RRT. We will refer to the implementations as AO- 
EST and AO-RRT. All planners are implemented in the Python 
programming language, and hence could be sped up greatly 
by the use of a compiled language. 

A. Implementations using RRT and EST 

Both kinodynamic EST and kinodynamic RRT are tree¬ 
growing planners that perform random extensions to a state- 
space tree, rooted at the start, by sampling a node in the tree 
and a control at random, and then integrating the dynamics 
forward over a short time horizon. They differ by sampling 
strategy. EST attempts to sample an extension so that its 
terminal state is uniformly distributed over the reachable set of 
the current tree. RRT attempts to sample an extension so that 
it is pulled toward a randomly-sampled state in state space (a 
Voronoi bias). Both methods can also incorporate goal biasing 
strategies to avoid excessive exploration of the state space in 
directions that are not conducive to reaching the goal. 

EST Implementation. EST can be applied directly to 
state-cost planning. To approximate sampling over a uniform 
distribution over the tree’s reachable set, it samples extensions 
with probability proportional to the inverse density of existing 
states in the tree. We use the standard method to approximate 
density by defining a grid of resolution h and low dimension k 
over randomly chosen orthogonal projections of the state-cost 
space. The density of a state x is estimated as proportional 
to the number of nodes in the tree N{x) contained in the 
same grid cell as x. In a manner similar to locality sensitive 
hashing, we choose several grids and count the total number 
of nodes sharing the same cell as x across all grids. Eor 
our experiments, we use grids, h = 0.1, and 

k = 3, and scale the configuration space A to the range 
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Fig. 4. The Kink and Bugtrap problems ask to find the shortest path between 
the two indicated configurations. 


Jo ijd 2 m(x)+i performing the random projection. To 

extend the tree we sample 10 candidate extensions by choosing 
10 source states uniformly from the set of occupied grid 
cells, and drawing one random control sample. Among those 
extensions that are feasible, we select one with probability 
proportional to 1 /{N(xf) where Xf is its terminal state. 

RRT Implementation. RRT can also be applied almost 
directly, but there are some issues to be resolved regarding 
the definition of a suitable distance metric. RRT relies on 
a distance metric to guide the exploration toward previously 
unexplored regions of state space, and is rather sensitive to 
the choice of this metric, with better performance as the 
metric approximates the true cost-to-go. However, cost-to- 
go is usually difficult to estimate accurately particularly in 
the presence of complex obstacles and dynamic constraints. 
Below, we empirically investigate the effects of the distance 
metric. Nearest node selection is accelerated using a KD-tree 
data structure. 

Performance considerations. In both cases, rather than 
planning from scratch each iteration, we maintain trees from 
iteration to iteration, which leads to some time savings. We 
also save time by pruning the portion of the tree with cost 
more than c whenever a new path to the goal is found. 
Specifically, smaller trees make EST density updates and RRT 
nearest neighbor queries computationally cheaper, although 
RRT benefits more from this optimization because a larger 
fraction of its running time is spent in nearest neighbor queries. 
We also prune more aggressively if a heuristic function h{x) 
is available. If h{x) underestimates the cost-to-go, then we 
can prune all nodes such that c ^ h{x) > c. Other sampling 
heuristics could also be employed to bias the search toward 
low-cost paths O. 

B. Example Problems 

Kink. The Kink problem (Fig. ia) is a kinematically- 
constrained problem in a unit square [0,1]^ in which the 
optimal solution must pass through a narrow corridor of width 
0.02 with two kinks. The objective is to minimize path length. 
Most planners very easily find a suboptimal homotopy class, 
but it takes longer to discover the optimal one. The maximum 
length of each expansion of the tree is limited to 0.15 units. 

Bugtrap. The Bugtrap problem (Fig.|^b) is a kinematically- 
constrained problem in a unit square [0,1]^ that asks the robot 


to escape a local minimum. The objective function is path 
length. This is a challenging problem for RRT planners due to 
their reliance on the distance metric as a proxy of cost. The 
maximum length of each expansion of the tree is limited to 
0.15 units. 

Dubins. This problem asks to move a standard Dubins car 
sideways while keeping orientation relatively fixed (Fig. [^. 
The state is {x,y,0) and the control is (t;,^) where 0 is 
the heading, v is the forward velocity, and (j) is the steering 
angle. State constraints include {x,y) e [0,1]^, V e {-l,+l}, 
and 0 G [—7r,7r]. For planning, time steps are drawn at ran¬ 
dom from [0,0.25]s. The metric is d{{x,y,0), {x',y',0')) = 
\/{x — x'Y + ~ y'Y + doiO^ / (27r) where dg mea¬ 
sures the absolute angular difference. The goal is to move the 
car sideways 0.4 units with a tolerance of 0.1 units in state 
space, with minimal execution time (equivalent to minimum 
path length). 

Double Integrator. This asks to move a point with bounded 
velocities and accelerations to a target location. The state space 
includes x = {q^v) includes configuration q and velocity v, 
with constraints q G [0,1]^, v G [—1, —1]^, and u G [—5,5]^, 

with q = V and v = u. The start is 0.06 units from the 

left and the goal is 0.06 units from the right, which must be 
reached with a tolerance of 0.2 units in state space. Distance 
is euclidean distance. Time steps are drawn from [0, 0.05] s. 

Pendulum. The pendulum swing-up problem places a point 
mass of m =1 kg at the end of a L =1 m massless rod. The 

state space is x = The goal is the set of states such that 

the rod is within 10° of inverted and absolute angular velocity 
less than 0.5 rad/s, and the cost is the total time required to 
complete the task. We take gravitational acceleration to be 
g =9.8 N-s^, and a motor can exert a torque at the fixed end 
of the rod with bang-bang magnitudes r G {—2,0,2} N-m. 
The dynamics of the system are described by: 


0 = uj 


(j = 


T — mg siii{0)L 
mL‘^ 


—9.8sin(6>) -f r 


(18) 

(19) 


The difficulty in this task arises from the fact that the exerted 
torque cannot make the pendulum complete a full rotation. In 
fact, the torque will be canceled by gravity at about 11.5°. 
Therefore, the only way to achieve an inverted position is 
to take the advantage of gravity by swinging back and forth 
and accumulating angular momentum. For planning, constant 
torques are applied for a uniformly chosen duration between 
0 and 0.5 s, and trajectories are numerically integrated using a 
time step of 0.01 s. Figure shows the first 5 paths obtained 
by AO-RRT. 

Flappy. We devised a simplified version of the once-popular 
game Flappy Bird. The “bird” has a constant horizontal 
velocity, and can choose to fall freely under gravity, or apply 
a sharp upward thrust. The trajectory is a piecewise-parabolic 
curve. In the original game the objective is simply to avoid 
obstacles as long as possible, but in our case we consider 
other cost functions. The goal is to traverse from the left 
of the screen to a goal region on the right. The screen 
domain is 1000 x 600 pixels with fixed horizontal velocity 
of Vx = bpx/s. The gravitational acceleration is ^ = Xpxj 









Fig. 5. Planning a sideways maneuver for a Dubins car using AO-RRT (top row) and AO-EST (bottom row). Numbers indicate total number of planning 
iterations. Green curve indicates best path found so far. Iteration counts are not directly comparable because RRT spends more time per iteration. 



Fig. 6. Plot of angle vs. time for the first five trajectories found by AO-RRT 
on the pendulum example. Shorter execution times (rightmost point on each 
curve) are preferred. The execution time decreases from 8.46 seconds in the 
first solution to 5.51 seconds in the fifth solution. (Best viewed in color) 


downward. The control u is binary, and provides an upward 
thrust of either 0 or ^pxjs^. Fig. shows an example solution 
path obtained by our planner. We represent the state by 


( 20 ) 


where (x, y) is the bird position and Vy is vertical velocity. The 
state evolves according io x = b, y = Vy, and Vy = — 1 + 
where u G {0,1} is the binary control. Time steps are sampled 
uniformly from the range [0,1], and the time evolution of 
the state is solved for analytically. The experiments below 
illustrate the ability of AO-x to accept unusual cost functions. 



Fig. 7. Example solution path for Flappy. The planner finds a path that goes 
from the starting point on the left to the green goal region on the right while 
avoiding obstacles represented by gray rectangles. 


C. Experiments 

Comparing AO-EST and AO-RRT. Fig. [^illustrates AO- 
EST and AO-RRT applied to the Car example. Qualitatively, 
RRTs tend to explore more widely at the beginning of plan¬ 
ning, while ESTs tend to focus more densely on regions 
already explored. As a result, in this example, AO-RRT finds 
a first path quicker, while AO-EST converges more quickly 
to the optimum (each iteration of EST is cheaper). Like 
in feasible planning, the best planner is largely problem- 
dependent, and we could find no clear winner on our other 
experiments. 

Benchmarking against comparable planners. We com¬ 
pare against the simpler meta-planner M-x which simply runs 
the feasible planner x multiple times, keeping the lowest-cost 
path found so far. We also experimented with a variant, M-x- 
Prune, which prunes search nodes whose cost is greater than 
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Fig. 8. Results of benchmark tests. Curves measure solution cost vs computation time, averaged over 10 runs. (Lower is better) 


the cost of the best path found so far. In the 2D problems, 
we compare against RRT* (121, and for fair comparison we 
provide the other RRT-based planners with the straight-line a 
steering function as well. We also compare against Anytime- 
RRT Cl and Stable-Sparse-RRT (SS-RRT) CS). We also 
compared SST* lIT^ . but it performed worse than SS-RRT 
in all of our tests. 

For fair comparison, all algorithms were implemented in 
Python using the same subroutines for feasibility checking, 
visibility checking, and distance metrics. All planners used 
the same parameters as AO-x where applicable. For Anytime- 
RRT we used e = 0.01 and (5c = 0.1, and found performance 
was relatively insensitive to these parameters. For SS-RRT, 
we used parameters 6bn = 0.1 and Sg = 0.03. Tuning of 
these parameters did not seem to have a consistent effect on 
performance. KD-trees were used for closest node selection 


in all of the RRT-based algorithms except Anytime-RRT, in 
which brute-force selection must be used because it does not 
select nodes using a true distance metric. 

Fig. ?? displays computation time vs solution cost, averaged 
over 10 runs for all of the benchmark problems. These results 
suggest that AO-EST consistently outperforms M-EST and M- 
EST-Prune, while AO-RRT sometimes outperforms M-RRT 
and M-RRT-Prune, but sometimes performs roughly the same. 
We find that Anytime-RRT and SS-RRT typically do not 
perform even as well as the simpler M — RRT algorithm, 
although Anytime-RRT did perform well on Elappy, and 
SS-RRT did perform well on Bugtrap. Surprisingly, RRT* 
performed excellently on Bugtrap but poorly on Kink despite 
the fact that it uses rewiring via a steering function. This drop 
in performance is explained by the fact that it spends excessive 
amounts of time building a detailed roadmap of the open 
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Fig. 9. The influence of cost weight in the RRT distance metric for the 
Pendulum example showing average cost (left) and planning time (right) over 
successive iterations. 



Fig. 10. Experiments comparing the cost weight on a planar kinematic 
problem suggests that estimating the true cost leads to faster convergence. 

homotopy class, rather than exploring the narrow passage. 

Overall, we observe that AO-EST is best or near-best 
performer in most problems. AO-RRT sometimes is the best 
performer, but is more inconsistent. A possible explanation is 
the well-known metric sensitivity of RRTs: when the distance 
metric becomes a poor approximation to cost-to-go, then RRT 
performance deteriorates. This property is inherited by AO- 
RRT. 

RRT distance metric. We empirically studied the influence 
of distance metric on planning time and quality for AO-RRT. 
For the pendulum example, we use a weighted Euclidean 
metric 

d{xi,X 2 ) = \/d0{ei,02)^ + (wi - W2)^ + Wc{ci - C 2 V 

( 21 ) 

where Wc trades off between the state-space distance and the 
cost-space distance. For each value of rcc = 0.1, 0.3, 1, 3, and 
10, we ran AO-RRT 10 times using a 60s time limit. Fig. 
shows that for this example, higher cost weights have a minor 
effect on solution cost but a detrimental effect on running time 
per iteration. 

We found a very different effect on a second problem. 
This one is a kinematically-constrained, planar minimum path 
length problem with obstacles. The “ideal” cost weight is 
1, since it perfectly measures the cost-to-go. Experiments in 
Fig. [rejustify this choice, showing that it converges quicker 
toward the optimum. 

Adaptation to different costs. Using the Flappy problem, 
we demonstrate the fast adaptability of the AO method to 
different cost functions, even those that are non-differentiable. 
AO-RRT is used here. First, we set cost equal to path length. 
The second cost metric penalizes the distance traveled only in 



Fig. 11. Convergence of Flappy with two different cost metrics. Brighter 
paths are of lower cost. Top: Penalizing path length. The first path is high 
cost (1866 px), passing through both upper openings, and eventually converges 
to a path that passes both lower openings (1234 px). Below: Penalizing low 
altitude paths. The first path passes through most of the lower openings (cost 
1330), and the planner converges to a path that passes through upper openings 
(cost 321). 

the lower half of the screen. The optimal path prefers high 
altitudes and passes through the two upper openings and one 
lower opening. Fig. [m shows the results. 

V. Conclusion 

This paper presents an equivalence between optimal motion 
planning problems (either kinodynamic or kinematic) and 
feasible kinodynamic motion planning problems using a state- 
cost space transformation. Despite the simplicity of the trans¬ 
formation, it is a powerful tool; we use it to develop an easily 
implemented, asymptotically-optimal, sampling-based meta¬ 
planner that accepts a sampling-based kinodynamic feasible 
planner as input. It purely uses control-based sampling, mak¬ 
ing it suitable for problems with general differential constraints 
and cost functions that do not admit a steering function. The 
expected convergence rate of the meta-planner is proven to be 
related to the goal-dependent running time of the underlying 
feasible planner. Using RRT and EST as feasible planning 
subroutines, we demonstrate that the proposed method attains 
state-of-the-art performance on a number of benchmarks. 

We hope this new formulation will provide inspiration 
and theoretical justiflcation for new approaches to optimal 
motion planning. As an example, an obvious way to improve 
convergence rate would be to run local optimizations on 
each trajectory found by the underlying planner; this method 
has been shown to work well for kinematic optimal path 
planning d . We also obtained curious results regarding state- 
space vs cost-space weighting in the RRT distance metric. 
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Following up on this work may also open up avenues of 

research in sampling strategies for state-cost space planning, 

e.g., in appropriate biasing strategies. 
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